The nucleocapsid phosphoprotein of the severe acute respiratory syndrome coronavirus (SARS-CoV N protein) packages the viral genome into a helical ribonucleocapsid (RNP) and plays a fundamental role during viral self-assembly. It is a protein with multifarious activities. In this article we review our current understanding of the N protein structure and its interaction with nucleic acid. Highlights of the progress include uncovering the modular organization, determining the structures of the structural domains, recognizing the roles of protein disorder in protein-protein and protein-nucleic acid interactions, and visualizing the RNP structure inside the virions. It was also demonstrated that the N protein binds to nucleic acid at multiple sites in a coupled allosteric manner. We propose a SARS-CoV RNP model that conforms to existing data and bears resemblance to the known RNP structures of RNA viruses. The model highlights the critical role of the modular organization and intrinsic disorder of the N protein in the formation and functions of the dynamic RNP capsid in RNA viruses. This paper forms part of a symposium in Antiviral Research on “From SARS to MERS: 10 years of research on highly pathogenic human coronaviruses.”
1. Introduction

The severe acute respiratory syndrome coronavirus (SARS-CoV) nucleocapsid (N) protein is the most abundant protein in virus-infected cells. Its primary function is to package the ~30 kb single-stranded, 5'-capped, positive-strand viral genomic RNA into a ribonucleoprotein (RNP) complex called the capsid. Ribonucleocapsid packaging is a fundamental part of viral self-assembly, and the RNP complex constitutes the essential template for replication by the RNA-dependent RNA polymerase complex. In addition, the SARS-CoV N protein has been shown to modulate the host cellular machinery and may serve regulatory roles during the viral life cycle (Ababou and Ladbury, 2007; Hsieh et al., 2005; Surjit et al., 2006). There have been several excellent reviews on the coronavirus N protein (Laude and Masters, 1995; Masters, 2006), including one on the SARS-CoV N protein (Surjit and Lai, 2008). Here we review recent findings on the structure and function of SARS-CoV N and its interaction with nucleic acid from a more biophysical point of view.


2. Packaging of RNP inside the virus 

Coronavirus assembly is localized at membranes of the endoplasmic reticulum-Golgi intermediate compartment, likely mediated by species-specific interactions of the matrix (M) protein with the spike (S), nucleocapsid (N), and envelope (E) proteins (de Haan et al., 2000; Krijnse-Locker et al., 1994). However, the detailed molecular packaging of N inside the virion and the interactions between N and other proteins remain unknown. Early EM studies of coronaviruses showed that coronavirus RNPs are helical, consisting of coils 9-16 nm in diameter with a hollow interior of approximately 3-4 nm (Caul and Egglestone, 1979; Davies et al., 1981; Macnaughton and Davies, 1978). More recently, Neuman et al. employed single-particle image analysis of 2D electron cryo-microscopy (cryo-EM) images to investigate the structural organization of SARS-CoV at 4 nm resolution (Neuman et al., 2006) (Fig. 1A). They observed overlapping lattices arranged near the viral membrane surrounding a disordered core. The RNP particles displayed a coiled shape when released from the viral membrane. Edge views revealed that most of the viral RNP was located within ~25 nm of the inner face of the membrane. Fifteen-nanometer-wide strands of electron-dense material can be seen emerging from a spontaneously disrupted SARS-CoV particle. The RNP is maintained in a spherically packaged form at the inner face of the membrane with no indication of icosahedral symmetry. The SARS-CoV nucleocapsid is separated from the envelope by a gap containing thread-like densities that connect the M protein density on the inner face of the viral membrane to a two-dimensionally ordered ribonucleoprotein layer (Fig. 1B), a feature also seen in TGEV (Risco et al., 1996). Since the carboxyl tail of the M protein has been shown to interact specifically with N (Escors et al., 2001; Kuo and Masters, 2002; Narayanan et al., 2000; Sturman et al., 1980), these results suggest that M-N interactions constrain some N molecules in close apposition to the envelope. Glycoprotein spikes were found to be aligned with the membrane-proximal layer of RNP densities, implying that protein location within the envelope is constrained by consistent S-M, M-M, and M-N interactions. This organization further implies that M is also arranged in a two-dimensional lattice, which was proposed to be a likely scaffold for viral assembly (de Haan et al., 2000). The stoichiometry of the unit cell at the virion surface was estimated to be approximately 1S3:16M:4N to 1S3:25M:4N, where S3 denotes a spike trimer, with the remainder of the N protein distributed throughout the virion core. Nucleoprotein molecules in the paracrystalline RNP shell appeared to be partially organized through interactions at points of contact in the RNP lattice. The distribution of density in the viral core was consistent with a membrane-proximal RNP lattice formed by local approaches of the coiled ribonucleoprotein. The cryo-EM images did not reveal any internal features within the ~25 nm-thick RNP zone proximal to the envelope, suggesting that the inner core densities of mature coronaviruses are not consistently ordered with respect to the membrane. A model based on the interpretation of the 2D cryo-EM data is shown in Fig. 1B.

Fig. 1. Structure of SARS-CoV N protein. (A) 2D electron cryo-microscopy reconstructed image of a SARS-CoV particle. (B) Interpretation of the virion structure. An edge view of the conserved structural proteins is shown in the left panel and the axial view in the right panel. Trimeric spikes (S) are shaded in red, membrane proteins (M) are in solid blue, and nucleoproteins (N) are shaded in violet. The figures are reproduced with permission from Neuman et al. (2006). (C) The modular structural organization of SARS-CoV N protein. The domain boundaries shown on the top were defined by Chang et al. (2006a). The ribbon representations of the structures of NTD (green) and CTD (blue and gold) were generated from coordinates in the Protein Data Bank (PDB IDs: NTD, 20FX; CTD, 2CJR). The relative orientation of NTD and CTD, as well as the conformations of the disordered regions (N-arm, LKR and C-tail), are drawn arbitrarily to reflect the dynamic nature of the N protein, as revealed by SAXS data (Chang et al., 2009). The ribbon structures were generated using PyMOL (The PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger, LLC).

Koster and associates, on the other hand, employed 3D cryo-electron tomography to study the structure of mouse hepatitis virus (MHV) particles (Barcena et al., 2009). They showed that the viral envelope is almost twice as thick as a typical biological membrane. The extra internal layer was attributed to the C-terminal domains of the M protein. In the interior of the particles, coiled structures and tubular shapes were observed, consistent with a helical nucleocapsid formed by self-association of the N protein and the genomic RNA. The RNP appears to be relatively densely packed and disorganized underneath the envelope. Consistent with previous observations, they also observed quasi-circular density profiles approximately 11 nm in diameter enclosing an empty space approximately 4 nm in diameter inside the otherwise relatively disorganized interior. The observation of only short coiled fragments in the reconstructions strongly suggests that the helical nucleocapsid is a very flexible structure that extensively twists and folds upon itself, adopting orientations that are not easily recognizable as coils in tomographic sections. The general features and global architecture observed for MHV were also observed in TGEV, suggesting a general model for the architecture of CoVs.

The pleomorphic nature of the coronavirus particle has hampered efforts to image the virion at atomic resolution. Nonetheless, cryo-EM images have provided considerable insight into the organization of the various structural proteins, especially the virion envelope and the RNP. They also revealed a structural plasticity that may play an essential role in the virus life cycle. The presence of partially organized and flexible N protein regions could facilitate packaging of the genomic RNA by CoVs.


3. Modular organization of the SARS-CoV N protein 

It is perhaps surprising that, prior to the outbreak of SARS, the structures of coronavirus N proteins had never been studied in detail. The earliest structural model of a coronavirus N protein was proposed by Parker and Masters in the 1990s (Masters, 1992; Parker and Masters, 1990), based on sequence comparison and evolutionary studies of MHV, a prototypical Group II coronavirus. In their model, the N protein comprised three domains separated by two spacers. The central domain acted as the RNA-binding domain, whereas the remaining two acidic domains presumably played a role in protein-protein interactions. Although the model provided a general overview of coronavirus N protein structure at the time, it lacked the detail needed to give a clear picture of the structure-function relationship of the protein.

The SARS pandemic ushered in a new era of structural studies of coronavirus proteins. The SARS-CoV N protein is a 46 kDa phosphoprotein of 422 amino acids that shares 20-30% sequence identity with the N proteins of other coronaviruses (Marra et al., 2003; Rota et al., 2003) (Fig. 2). Through its C-terminal region it forms a dimer, which constitutes the basic building block of the nucleocapsid (Chang et al., 2005; Surjit et al., 2004; Yu et al., 2005). Huang et al. first solved the solution structure of the N-terminal domain (residues 45-181; Fig. 1C), which they termed the RBD, and demonstrated that this domain binds RNA with micromolar affinity (Huang et al., 2004b). The term RBD is misleading, since RNA binds to N at multiple sites other than the RBD; to avoid confusion we use the acronym NTD instead from here on. The structure of the dimerization domain (residues 248-365) was solved by X-ray crystallography and NMR (Chen et al., 2007; Takeda et al., 2008; Yu et al., 2006) (Fig. 1C). Because this domain does more than dimerize and also binds nucleic acid, we refer to it as the CTD. As shown by NMR, chromatography, and small-angle X-ray scattering (SAXS), the NTD and CTD form two independent domains that do not interact with each other (Chang et al., 2006). It was evident at this point that the original three-domain model would require extensive revision in light of these developments.

Fig. 2. Multiple sequence alignment of coronavirus N proteins (sequence lengths: SARS-CoV 422, IBV 409, MHV 454, HCoV OC43 448, and MERS-CoV 413 residues). Shaded positions represent conserved residues among the compared sequences. Residues in red denote aromatic residues that are postulated to be involved in base-stacking interactions when binding to RNA. Secondary structure elements based on the SARS-CoV N protein are shown on top of the alignment, with arrows and cylinders representing β-strands and α-helices, respectively. The alignment was calculated on the Clustal Omega server (http://www.ebi.ac.uk/Tools/msa/clustalo).

The modular organization of SARS-CoV N was further defined by a combination of bioinformatic and biophysical methods by Chang et al., who showed that the two structural domains are interspersed with intrinsically disordered regions (IDRs) that account for ~40% of the amino-acid residues (Fig. 1C) (Chang et al., 2006, 2009). Intrinsically disordered proteins (IDPs) and IDRs, a relatively new concept in structural biology, lack a defined tertiary structure in the native state but play important roles in biological processes, particularly in macromolecular interactions (Dunker et al., 2001; Dyson, 2011, 2012; Dyson and Wright, 2005; Xie et al., 2007). In the case of the SARS-CoV N protein, all three IDRs (residues 1-44, 182-247, and 366-422) are able to modulate the RNA-binding activity of the NTD and CTD (Chang et al., 2009). The middle IDR, which we termed the LKR, and the C-terminal IDR have both been implicated in oligomerization of the N protein (He et al., 2004a; Luo et al., 2006). The LKR includes a Ser/Arg-rich region that contains a number of putative phosphorylation sites, which may regulate N protein function (Peng et al., 2008; Surjit et al., 2005; Wu et al., 2009) and the N-M interaction (He et al., 2004b). Based on these findings, Chang et al. proposed a structure-based domain arrangement for the SARS-CoV N protein in which the NTD and CTD are sandwiched between three IDRs. Sequence alignments suggested that other coronavirus N proteins might share the same structural organization, based on intrinsic disorder predictor profiles and secondary structure predictions (Fig. 2). The NTD and CTD structures determined for the N proteins of infectious bronchitis virus (IBV) (Fan et al., 2005; Jayaram et al., 2006), MHV (Grossoehme et al., 2009; Ma et al., 2010) and human coronavirus OC43 (Chen et al., 2013) were in general agreement with the structure-based model. The N protein sequence of the recently discovered Middle East respiratory syndrome coronavirus (MERS-CoV) also shares the same intrinsic disorder and secondary structure profile, which further supports the universality of the structure-based model (van Boheemen et al., 2012).

Although the original three-domain model has been partially superseded by the structure-based model, some features of the earlier model can be reconciled with the latter. First, the second spacer and the C-terminal acidic region of the three-domain model map to the C-terminal IDR of the structure-based model. As with the SARS-CoV N protein, the C-terminal acidic region of the MHV N protein has been shown to self-interact (Hurst et al., 2005), and a C-terminal IDR in the N protein of human coronavirus strain 229E has been reported to be involved in oligomerization (Lo et al., 2013). Second, the RNA-binding domain of the three-domain model could be redefined to span both the NTD and the CTD. Indeed, Hurst et al. noticed that effective RNA binding by the MHV N protein in host cells required the presence of both the NTD and CTD (Hurst et al., 2009), suggesting that the NTD and CTD form a single bipartite RNA interaction site, a feature that reappears in the SARS-CoV RNP model proposed at the end of this review. In this regard, the structure-based model is an evolution of the original three-domain model that provides a more refined framework for linking the structure and function of coronavirus N proteins.

Modular structures are found in many RNA-binding proteins, including other viral nucleocapsid proteins (Draper, 1999; Lunde et al., 2007). For example, the nucleocapsid protein of Bunyamwera virus is a single-stranded RNA-binding protein with two modular domains (Li et al., 2013). A modular architecture confers several advantages that would not be possible with single-domain proteins. These include: (i) Enhanced binding specificity and affinity through cooperative, coupled allosteric binding of the individual domains. The modular organization of a protein also allows it to present a large and flexible surface for binding to complex structural features, or to multiple and extended regions of target molecules such as RNAs. (ii) Facilitated regulation and functional expression. The relatively weak interactions of individual domains make it easier to regulate the formation and disassembly of RNP complexes when needed; assembly and disassembly can proceed via the (un)zipping action of one module at a time, each step with a moderate free energy cost. (iii) The multiple binding sites can evolve independently, thus enhancing environmental adaptation. The modular nature of the SARS-CoV N protein, and of N proteins from the Coronaviridae in general, is clearly essential for RNP packaging and viral function.
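Points (i) and (ii) can be made quantitative with a textbook two-site linkage (chelate) model, used here only as an idealized illustration and not as a model fitted to N protein data. Each domain alone binds with dissociation constant `kd1` or `kd2`; once one domain is bound, its tethered partner sees its own site at an effective concentration `c_eff`. The input values (10 µM per domain, of the same order as the micromolar NTD affinity noted earlier, and `c_eff` = 1 mM) are assumptions, and the function names are hypothetical:

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kJ/mol, 1 M standard state) for a Kd in M."""
    return R_KJ * temp_k * math.log(kd_molar)


def apparent_kd_two_sites(kd1, kd2, c_eff):
    """Apparent Kd of a two-domain protein binding one multivalent target.

    Idealized linkage model: summing the two singly bound states and the
    doubly bound state gives 1/Kd_app = 1/kd1 + 1/kd2 + c_eff/(kd1*kd2).
    All quantities are in molar units.
    """
    return 1.0 / (1.0 / kd1 + 1.0 / kd2 + c_eff / (kd1 * kd2))


if __name__ == "__main__":
    # Illustrative numbers only: two 10 uM domains, 1 mM effective concentration.
    kd_single = 10e-6
    kd_app = apparent_kd_two_sites(kd_single, kd_single, c_eff=1e-3)
    print(f"single domain: {kd_single:.1e} M, {binding_free_energy(kd_single):.1f} kJ/mol")
    print(f"linked pair:   {kd_app:.1e} M, {binding_free_energy(kd_app):.1f} kJ/mol")
```

With these inputs the apparent Kd is about 98 nM, roughly 100-fold tighter than either domain alone, whereas detaching one domain from the doubly bound state costs only RT ln(c_eff/kd2), about 11 kJ/mol at 298 K, a fraction of the ~40 kJ/mol total. Tight overall binding built from individually weak, separately releasable contacts is the intuition behind the (un)zipping argument above.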


4. Structure of SARS-CoV N protein 

4.1. Structure of the N-terminal domain 

The structure of the SARS-CoV N NTD was first determined by NMR (Huang et al., 2004b). The protein adopts a unique five-stranded antiparallel β-sheet with the topology β4-β2-β3-β1-β5 (Fig. 3A). The middle strands β2 and β3 are connected by a protruding β-hairpin (β2′-β3′). The residues in the extended β-hairpin are predominantly basic, with 5 of the 15 residues being arginines or lysines. The 3D fold creates a positively charged pocket at the junction between the hairpin and the core structure, which serves as the RNA-binding site, as confirmed by NMR chemical shift perturbation upon addition of a 16-mer or 32-mer RNA (Fig. 3B). NMR relaxation and heteronuclear NOE data indicate that the β-hairpin is highly flexible, suggesting that this region may undergo conformational adaptation upon RNA binding (Clarkson et al., 2009). The structural features of the NTD are reminiscent of the β-sheet RNA recognition proteins found in many RNPs (Draper, 1999). This class of proteins has a βαββαβ fold in which the first and third β-strands, located in the middle of the sheet, contain characteristic aromatic residues. In the crystal structure of the U1A-RNA hairpin complex, three bases are stacked against conserved aromatic residues while a flexible, long β-hairpin grasps the RNA against the β-sheet (Oubridge et al., 1994). These aromatic residues are thought to orient bases on the protein surface rather than to select particular RNA sequences. The SARS-CoV NTD also contains many conserved aromatic residues in the same structural region. Although not confirmed, it is probable that some of these aromatic residues in SARS-CoV N, in particular Tyr87, Tyr110, Tyr112, Tyr113, Tyr122, and Trp133, which lie on the same face of the β-sheet and are conserved in coronaviruses, play similar roles in RNP packaging (Fig. 3C).

Fig. 3. Structure comparisons of coronavirus N proteins. (A) Structure comparison of various coronavirus NTDs (grey: SARS-CoV, 20FX; magenta: IBV, 2GEC; blue: MHV, 3HD4; cyan: HCoV OC43, 4J3K). The surface charge distributions of (B) SARS-CoV and (D) IBV, MHV and HCoV OC43 NTDs are shown in the same orientation. (C) Spatial arrangement of aromatic residues in the NTD speculated to be involved in base-stacking interactions when binding to RNA. Residues in the loop connecting strands β3 and β4 (Gly115-Gly130) have been removed for clarity. (E) Superimposition of the CTD structures of SARS-CoV (gold, 2CJR) and IBV (cyan, 2GE7). The corresponding surface charge distributions are shown in (F) and (G) for SARS-CoV and IBV, respectively. All structures and surface charge distributions were generated using PyMOL.

The NTD structure of SARS-CoV N was later determined by X-ray crystallography in two crystal forms (Saikatendu et al., 2007). The overall fold of the crystal structure is similar to that observed in solution by NMR, with a root mean square deviation (RMSD) of 2.6 Å over 112 superimposed Cα atoms of the monoclinic form. A significant inward shift of loops L1 and L3 and an outward hinge motion of the β-hairpin were observed, making the RNA-binding cleft significantly narrower and shallower in the crystal structure. It is not clear whether the difference is due to insufficient NOE constraints in the solution structure, to crystal packing, or to both. Nonetheless, the difference between the two structures further supports the concept that the RNA-binding cleft is deformable and is likely to adopt a different conformation upon RNA binding. Intriguingly, in the cubic form the individual monomers are organized as trimeric units, and consecutive trimers stack in a right-handed twist, resulting in an overall packing of a helical tubule. At present the physiological relevance of this helical packing is unclear.
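RMSD values such as 2.6 Å over 112 Cα atoms are computed after optimally superimposing paired atoms. The text does not say which program the cited studies used; the sketch below shows the standard Kabsch superposition with NumPy, assuming the equivalent Cα coordinates have already been extracted and paired row by row (parsing of coordinate files is not shown). The function name is illustrative:

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between paired (N, 3) coordinate sets after optimal
    rigid-body superposition (Kabsch algorithm)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering each set on its centroid.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and average the squared per-atom deviations.
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of the same coordinates gives an RMSD of essentially zero, so any non-zero value reflects genuine conformational differences, such as the loop and hairpin shifts described above, rather than differences in orientation.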

4.2. Structure of the C-terminal structural domain 

The C-terminal structural domain (CTD) of SARS-CoV N exists in dimeric form (Chang et al., 2005, 2006; Yu et al., 2005). The crystal structure of the CTD was solved for two different constructs, CTD270-370 (Yu et al., 2006) and CTD248-365 (Chen et al., 2007). Alignment of 176 corresponding Cα atoms gave an RMSD of 0.511 Å, indicating that the two structures are practically identical. However, the absence of the N-terminal 22-residue peptide (residues 248-269) in CTD270-370 significantly diminished protein-protein interaction and crystal packing, as well as interaction with nucleic acids, as described below. Each CTD monomer is composed of eight α-helices and a β-hairpin with the topology α1α2α3α4α5α6β1β2α7α8 (Fig. 3E). The dimer has the shape of a rectangular slab in which the four-stranded β-sheet forms one face and the α-helices form the opposite face. The two C termini are located at diagonal apices on the β-sheet face, and the two N termini are located at the centers of two opposing edges of the slab. The dimerization interface is composed of four β-strands and six α-helices, with each protomer contributing one β-hairpin and helices α5, α6 and α7. The long β2 strand of one protomer pairs with the β2 strand of the other protomer to form the four-stranded intermolecular antiparallel β-sheet, which is stabilized through extensive hydrogen bonding (Chang et al., 2005). Each hairpin also interacts extensively with the hairpin from the other protomer in a domain-swapped manner. The other part of the dimerization interface is composed of helices α5 and α6, where strong hydrophobic interactions involving Trp302, Ile305, Pro310, Phe315 and Phe316 were observed. The dimer is further stabilized by hydrophobic interactions between the longest helix, α7, and the intermolecular β-sheet. The combination of hydrogen bonds and hydrophobic interactions results in a very stable dimer with a buried surface area of ~5280 Å², suggesting that the dimer is likely the native structure of the coronavirus N protein. Takeda et al. solved the solution structure of CTD248-365 and showed that the NMR structure is almost identical to the crystal structure. The backbone RMSD between the protomers of the mean NMR structure and the crystal structure of the CTD spanning residues 248-365 is 1.45 Å when residues 260-319 and 333-358 are superimposed. However, in the NMR structure the two N-terminal segments (residues 248-265) protruding from the dimer core are disordered and lack a short helix formed by residues 259-263, whereas in the crystal structure they are involved in a number of intra-monomer and intra-dimer contacts and are more rigid.

Interestingly, Chang et al. also observed the formation of an octamer in the asymmetric unit of the CTD crystal. Translational stacking of the octamer forms a hollow twin-helix structure with an outer diameter of ~90 Å, an inner diameter of ~45 Å, and a pitch of ~140 Å. The groove of the twin helix, which is lined with several positively charged residues, has a depth of ~22.5 Å. The N-terminal 22 residues (248-269) play an important role in protein-protein interactions in the octamer, accounting for the absence of the octamer in the crystal structure of CTD270-370. NMR chemical shift perturbation studies of single-stranded DNA binding, together with mutational analyses, identified this mostly disordered N-terminal region as the prime site for nucleic acid binding (Takeda et al., 2008). In addition, residues in the β-sheet region also showed significant perturbations. Mapping the locations of these residues onto the helical model observed in the crystal structure of CTD248-365 revealed that both regions form part of the interior lining of the positively charged helical groove (Fig. 3F). This observation led to the proposal of a helical packaging model of the SARS-CoV RNP, as elaborated in more detail in the following sections.
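Chemical shift perturbation (CSP) mapping, used here and for the NTD to locate nucleic acid-binding residues, is commonly summarized by combining the ¹H and ¹⁵N changes of each backbone amide into one weighted value and flagging residues above a threshold. The weighting factor (0.14) and the mean + 1 SD cutoff below are common conventions assumed for illustration; the exact criteria of Takeda et al. (2008) are not given in this text, and the helper names are hypothetical:

```python
import math
from statistics import mean, pstdev


def combined_csp(d_h, d_n, alpha=0.14):
    """Weighted 1H/15N chemical shift perturbation in ppm.

    alpha rescales the 15N change onto the 1H scale; 0.14 is a commonly
    used value, but published studies use values of roughly 0.1 to 0.2.
    """
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def perturbed_residues(shifts_free, shifts_bound, alpha=0.14, n_sd=1.0):
    """Flag residues whose combined CSP exceeds mean + n_sd * SD.

    shifts_*: dict mapping residue number -> (1H ppm, 15N ppm). Only residues
    assigned in both states are compared. Returns (flagged residues, all CSPs).
    """
    common = sorted(set(shifts_free) & set(shifts_bound))
    if not common:
        return [], {}
    csp = {}
    for res in common:
        h0, n0 = shifts_free[res]
        h1, n1 = shifts_bound[res]
        csp[res] = combined_csp(h1 - h0, n1 - n0, alpha)
    values = list(csp.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    hits = [res for res in common if csp[res] > cutoff]
    return hits, csp
```

Flagged residues would then be mapped onto a structure, as was done with the helical CTD model, to see whether they cluster into a contiguous surface such as the positively charged groove.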

Owing to difficulties arising from protein stability and dynamic behavior, no structures are available for any full-length coronavirus N protein. Fitting of small-angle X-ray scattering (SAXS) data led Chang et al. to propose a structural model for a di-domain (DD) construct spanning the NTD, LKR, and CTD of the SARS-CoV N protein (a.a. 45-365) (Chang et al., 2009) (Fig. 1C). In this model the DD dimer adopts a clamp-like open conformation, with the LKRs serving as two arms connecting the two NTDs to the CTD dimer. The model is consistent with the known structural features of coronavirus N proteins, namely the dimerization of the CTD and the intrinsically disordered nature of the LKR, and currently remains the only structural model spanning multiple domains of a coronavirus N protein.
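Before any structural model is fitted, SAXS analysis usually starts with a model-free estimate of the radius of gyration (Rg) from the low-angle Guinier region, ln I(q) = ln I(0) - Rg²q²/3. The text does not describe the exact fitting procedure of Chang et al. (2009), and flexible, partly disordered proteins such as N generally require ensemble-based fitting beyond this step; the sketch below only illustrates the Guinier estimate with a plain least-squares line. Function and variable names are illustrative:

```python
import math


def guinier_fit(q, intensity):
    """Estimate Rg and I(0) from low-angle SAXS data via the Guinier law.

    Fits ln I(q) = ln I(0) - (Rg**2 / 3) * q**2 by ordinary least squares on
    (q**2, ln I). q and Rg must use reciprocal units (e.g. 1/angstrom and
    angstrom). Only valid in the low-q Guinier region.
    """
    if len(q) != len(intensity) or len(q) < 2:
        raise ValueError("need at least two paired (q, I) points")
    if any(i <= 0 for i in intensity):
        raise ValueError("intensities must be positive to take logarithms")
    xs = [qi * qi for qi in q]
    ys = [math.log(i) for i in intensity]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("q values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # equals -Rg**2 / 3 for ideal Guinier data
    if slope >= 0:
        raise ValueError("non-negative slope: data not in a Guinier regime")
    rg = math.sqrt(-3.0 * slope)
    i0 = math.exp(y_bar - slope * x_bar)
    return rg, i0
```

The fit should be restricted to the low-q range (commonly q·Rg below about 1.3 for compact particles); upward curvature at the lowest q in a Guinier plot usually signals aggregation rather than a property of the monomeric or dimeric protein.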

4.3. Comparison with N proteins of other coronaviruses 

Comparison of the SARS-CoV N protein structure with those of other viral N proteins provides valuable mechanistic and evolutionary insights. The NTDs from avian infectious bronchitis virus (IBV) (Fan et al., 2005; Jayaram et al., 2006), mouse hepatitis virus (MHV) (Grossoehme et al., 2009), and human coronavirus OC43 (HCoV-OC43) (Chen et al., 2013), as well as the CTD of IBV (Jayaram et al., 2006), have been reported. The sequence identities of the SARS-CoV NTD with those of IBV and HCoV-OC43 are 34% and 47%, respectively, yet the 3D structures of these three proteins are highly homologous. The RMSD between the SARS-CoV NTD (PDB ID: 20FZ) and the IBV NTD (PDB ID: 2GEC) is 0.665 Å for 69 aligned Cα atoms, and that between the SARS-CoV NTD and the HCoV-OC43 NTD is 0.838 Å for 86 aligned Cα atoms (Chen et al., 2013). Interestingly, the surface charge distributions of the NTDs of IBV, MHV and HCoV-OC43 differ significantly from that of the SARS-CoV NTD (Fig. 3D), suggesting that they may interact with RNA differently. The structure of the IBV CTD (a.a. 219-349, PDB ID: 2GE7) is also highly homologous to that of the SARS-CoV CTD (PDB ID: 2CJR): the RMSD over 182 aligned Cα atoms of the dimer is 1.563 Å, and both exist as domain-swapped dimers. Three types of interactions (S-, L- and F-types) were observed in three crystal forms of the IBV CTD. Intriguingly, the type S interaction observed in crystal forms 1 and 2 (Fig. 4A and B in Jayaram et al. (2006)) bears a high resemblance to that observed in the helical packing of the SARS-CoV CTD (Fig. 4A). Furthermore, the surface charge distribution of the IBV CTD dimer also contains a positively charged strip spanning the region observed for the SARS-CoV CTD dimer (Fig. 3F and G), implying a similar interaction between the CTD and RNA for the two coronaviruses.
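Sequence identity values such as 34% and 47% depend on the alignment and on the denominator used. A minimal sketch, assuming the two sequences are already aligned (for example with Clustal Omega, as in Fig. 2) and counting identity over gap-free columns only; the function name and the toy sequences in the note below are hypothetical, not the actual N protein sequences:

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped, so the denominator
    is the number of gap-free columns (one of several common conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gap column: excluded from the comparison
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared
```

For example, `percent_identity("ACDE-G", "ACEEKG")` compares five gap-free columns, four of which match, and returns 80.0. Dividing by the full alignment length or by the length of the shorter sequence would give a different value for the same pair, which is one reason reported identities are not always directly comparable to the structural RMSD values.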

4.4. The CTD dimer interface suggests possible evolutional link 
between corona- and arteriviruses 

Sequence alignment coupled with secondary structure predic¬ 
tion show that many coronavirus CTDs share the ppa topology ob¬ 
served in SARS-CoV (Chang et al., 2005). These results raise the 
possibility that all coronaviruses employ the same interface mech¬ 
anism for dimerization and they belong to the same structural 
class. The structural arrangement of CTD is also reminiscent of 
the dimer-interface of the nucleocapsid protein from porcine 
reproductive and respiratory syndrome virus (PRRSV), an arterivi- 
rus (Chang et al., 2005). Thus, there are common principles that 
underlie the architecture of the nucleocapsid protein in both 
SARS-CoV and PRRSV. The structural similarity between the N pro¬ 
teins of SARS-CoV and PRRSV provides valuable information for 
understanding the evolutionary links between corona- and arteri¬ 
viruses, suggesting a possible common origin of these two proteins 
(Yu et al., 2006). 

5. Biophysical aspects of SARS-CoV N protein self-association 

5.1. The CTD is a transient self-association site of the SARS-CoV N 
protein 

Reports in the literature suggested that N can oligomerize through the SR-rich region or the C-terminal fragments in a concentration-dependent manner (Surjit and Lai, 2008). However, these early studies used fragments that often cut through structured regions, which could have adversely affected both their structures and their oligomerization behavior. Crystal structures of coronavirus N proteins led to several proposed N polymers that could bind RNA and mimic RNP packaging (Chen et al., 2007; Fan et al., 2005; Jayaram et al., 2006; Saikatendu et al., 2007). It is unclear whether these oligomer structures are biologically relevant, since no oligomeric species have been reported in solution. To test whether the oligomer structure reflects transient interactions that were trapped during crystallization, Chang et al. applied an in vitro disulfide-trapping technique to capture these transient interactions in solution (Chang et al., 2013). Specifically, using the crystal structures as guides, they engineered single-site cysteine mutations at various locations and used size-exclusion chromatography to test the ability of these mutants to spontaneously form disulfide linkages. SARS-CoV N contains no cysteine, and none of the mutated sites lies close enough to its counterpart in the same dimer to form an intra-dimer disulfide; any disulfide linkage must therefore arise from inter-dimer bond formation. The results suggested that fragments containing the CTD of SARS-CoV N protein are capable of transient self-association through the oligomer interface identified in the crystal structure, even though the long-lived, stable helical structure of the CTD was not observed in solution. Thus, the CTD dimer-dimer interaction observed in the crystal is also the preferred interaction in solution, but the oligomer forms only transiently because the interaction is weak, as reflected in the small interface area between CTD dimers (~1000 Å²). Presumably these weak interactions can be augmented synergistically by RNA binding and by the other N protein domains linked to the CTD through the LKR, and the conformation of the CTD oligomer will be further modified by N-RNA interactions. A similar strategy was applied to engineer NTD mutants. However, no significant oligomer formation was observed for the NTD fragments, suggesting that the NTD either does not form oligomers or forms them through an intermolecular interface other than the one identified in the NTD crystal structure.

Fig. 4. A proposed model of the SARS-CoV ribonucleocapsid protein. The crystal packing of a 24-mer CTD domain is shown in side view (A) and top view (B). The surface charge distribution of the SARS-CoV CTD 24-mer. (C) Top view of the model shows the docking of two RNA chains (orange and yellow ribbons) onto the 24-mer CTD structure. The CTD 24-mer is shown in surface charge representation. The RNA chains were modeled with the phosphate backbone (red spheres) facing inside the groove and bases (yellow rings) pointing outward. (D) Top view of the putative CTD-RNA complex. (E) Schematic of the docking of NTD onto the CTD 24-mer-RNA complex. The NTD domains are represented by ellipsoids.

5.2. Electrostatic screening and phosphorylation-mimicking mutation affect SARS-CoV N protein self-association

SARS-CoV N is a highly basic protein carrying an excess of 25 positive charges. These charges are considered important for RNA binding, but they can also hinder self-association of the protein through electrostatic repulsion (Huang et al., 2004b; Takeda et al., 2008). Chang et al. tested whether salt concentration affects the transient self-association of SARS-CoV N, using the disulfide-trapping experiment described above with the Q290C mutant. Gln290 is located at the interface between two dimers, and the two Gln290 residues within a dimer are far apart; formation of disulfide bonds in the Q290C mutant would therefore require at least two dimers to come close together, resulting in tetramers or higher oligomers. The relative amount of tetramers and larger oligomers in solution increased with increasing salt concentration, suggesting that screening of charge repulsion at higher salt concentration enhances self-association of the CTD.
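The salt effect described above can be made concrete with the Debye screening length, the distance over which electrostatic interactions are damped by counter-ions. For a 1:1 electrolyte in water at 25 °C a standard approximation is κ⁻¹ ≈ 0.304/√I nm, with the ionic strength I in mol/L. The sketch below is general electrostatics background under those assumptions, not an analysis performed by Chang et al. (2013); the function name and the salt concentrations are illustrative only.

```python
import math

def debye_length_nm(ionic_strength_molar: float) -> float:
    """Approximate Debye length (nm) for a 1:1 electrolyte in water at 25 °C.

    Uses the common approximation kappa^-1 ~= 0.304 / sqrt(I) nm, with I in mol/L.
    """
    if ionic_strength_molar <= 0:
        raise ValueError("ionic strength must be positive")
    return 0.304 / math.sqrt(ionic_strength_molar)

# Illustrative NaCl concentrations (mol/L); for a 1:1 salt, I equals the concentration.
for conc in (0.05, 0.15, 0.5):
    print(f"{conc:.2f} M NaCl -> Debye length ~ {debye_length_nm(conc):.2f} nm")
```

A shorter screening length at higher salt means that like charges on neighbouring N dimers repel each other over a shorter range, which is qualitatively consistent with the increase in tetramers and larger oligomers reported above.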

The N protein is heavily phosphorylated at the Ser/Arg-rich portion of the LKR region (Peng et al., 2008; Surjit et al., 2005; Zakhartchouk et al., 2005), and phosphorylation may affect nucleocytoplasmic shuttling of the N protein (Surjit et al., 2005; Wu et al., 2009). Peng et al. demonstrated that phosphorylation of the LKR by SR protein kinase-1 (SRPK1) partially impaired self-association of the full-length protein (Peng et al., 2008). Chang et al. examined whether changing the electrostatic properties of the protein itself could affect transient self-association (Chang et al., 2013). They chose the putative phosphorylation sites on the flexible linker as prime targets and assayed the effect of negative charges on N protein self-association by mutating these sites from Ser to Glu in the Q290C mutant of a di-domain construct containing the NTD, LKR and CTD (DDQ290C, a.a. 45-365). They observed that gradually introducing negative charges on the unstructured linker promoted oligomerization of the DD relative to the DDQ290C control, with the maximum effect achieved when three negative charges were introduced per chain. Further increases in negative charge were less effective in enhancing DDQ290C oligomerization. Overall, the results suggest that hyperphosphorylation of the LKR, which reduces the total positive charge of the N protein, can enhance and regulate oligomerization of the DD through electrostatic effects, pointing to a biophysical mechanism in which electrostatic repulsion may act as a switch regulating N protein oligomerization.
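The electrostatic argument can be illustrated with simple charge bookkeeping. The sketch starts from the excess of 25 positive charges stated above for the full-length chain (the experiments themselves used the shorter di-domain construct) and assumes that each Ser to Glu substitution adds one negative charge at neutral pH; a real phosphoserine typically carries closer to two, so Glu is only an approximate mimic. The helper name is hypothetical, and the sketch tracks net charge only; it does not model oligomerization.

```python
def net_charge_per_chain(base_net_charge: int, n_ser_to_glu: int,
                         charge_per_substitution: int = -1) -> int:
    """Net charge of one N chain after n Ser->Glu phosphomimetic substitutions.

    Hypothetical bookkeeping helper: assumes each Glu contributes one extra
    negative charge and that nothing else in the chain changes.
    """
    if n_ser_to_glu < 0:
        raise ValueError("number of substitutions cannot be negative")
    return base_net_charge + n_ser_to_glu * charge_per_substitution

BASE = 25  # excess positive charges on full-length SARS-CoV N (from the text)
for n in range(0, 6):
    chain = net_charge_per_chain(BASE, n)
    # Two chains per CTD dimer, so a dimer carries twice the per-chain charge.
    print(f"{n} Ser->Glu per chain: chain {chain:+d}, dimer {2 * chain:+d}")
```

Because the chains associate as dimers, each substitution per chain removes two net positive charges per dimer, which is one way to see why a few substitutions can noticeably change dimer-dimer repulsion.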


6. Protein-nucleic acid interaction 

6.1. SARS-CoV N protein binds to nucleic acids at multiple sites 

The primary function of the coronavirus N protein is to package the viral genome into a ribonucleoprotein (RNP) particle that protects the genomic RNA and allows its incorporation into a viable virion. Thus, N must bind RNA tightly. During viral infection, however, the N protein must also dissociate readily to expose the genomic RNA for efficient expression, transcription and replication (Lai and Cavanagh, 1997; Tahara et al., 1994, 1998), which demands a low energy barrier for N to dissociate from RNA. Viruses have evolved a clever tactic to reconcile these two seemingly contradictory requirements. The secret lies in the modular structural organization of the N protein and the dynamic nature conferred on it by the intrinsically disordered regions. Much information about the interaction between the N protein and RNA in coronaviruses has been gathered through studies of the MHV model system, including detection of general binding activity (Robbins et al., 1986) and identification of RNA sequences that bind the protein with high affinity (Nelson et al., 2000). However, it was the discovery of SARS-CoV that spurred research into the mechanisms behind the interaction between coronavirus N protein and nucleic acids. Studies of the nucleic acid-binding behavior of SARS-CoV N protein at the domain level have started to provide much-needed insight into the binding mechanism of coronavirus N proteins. SARS-CoV N protein is a highly basic protein whose excess positively charged residues are mostly localized in three regions: the SR-rich region of the LKR (residues 176-204, +6 charges), the N-terminal region of the CTD (residues 248-267, +7 charges) and the C-terminal disordered region (residues 370-389, +7 charges). The nucleic acid-binding activity of the NTD was tested and confirmed early on (Huang et al., 2004a) because of the presence of the classic RNA-binding motif first detected in the U1-RNP (Nagai et al., 1995). A role for the other structural domain, the CTD, in nucleic acid binding was not expected, since initial structures of the domain did not include the residues that interact with nucleic acids (Yu et al., 2006). Structures of longer CTD constructs later revealed a positively charged groove on the surface of the molecule that could act as a nucleic acid-binding site (Chen et al., 2007), and follow-up studies demonstrated that the CTD binds both ssDNA and ssRNA with an affinity similar to that of the NTD (KD ~ 10 µM) (Chang et al., 2009; Takeda et al., 2008). The IDRs, on the other hand, have not been studied individually because of stability issues (Mark et al., 2008). However, adding the IDRs to either structural domain significantly increased the binding affinity and binding cooperativity towards a poly(U) ssRNA in vitro (KD ~ 0.8 µM), suggesting that the IDRs modulate the nucleic acid-binding activity of SARS-CoV N protein (Chang et al., 2009). Of particular interest is the role of the LKR in N protein-nucleic acid interaction, since it contains an SR-rich motif where most of the putative phosphorylation sites are located. SARS-CoV N protein has been reported to be hypophosphorylated within the virion (Wu et al., 2009), and deletion of the SR-rich motif within the LKR resulted in the formation of larger-than-normal RNPs that were sensitive to RNase treatment (Peng et al., 2008). These observations suggest that phosphorylation of the SARS-CoV N protein at the LKR not only affects N oligomerization but may also decrease its nucleic acid-binding affinity.
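Converting the dissociation constants quoted above into standard binding free energies, ΔG° = RT ln KD (KD in mol/L, 1 M standard state), shows the energetic size of the IDR contribution. The sketch assumes 298.15 K and uses the approximate values from the text (~10 µM for a single structural domain, ~0.8 µM with the IDRs included); the function name is illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)

def binding_free_energy_kj(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy dG = RT ln(Kd), Kd in mol/L (1 M standard state)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ * temp_k * math.log(kd_molar)

dg_domain = binding_free_energy_kj(10e-6)    # ~10 uM, single structural domain
dg_with_idr = binding_free_energy_kj(0.8e-6)  # ~0.8 uM, domain plus IDRs
print(f"single domain: {dg_domain:.1f} kJ/mol")
print(f"with IDRs:     {dg_with_idr:.1f} kJ/mol")
print(f"difference:    {dg_with_idr - dg_domain:.1f} kJ/mol")
```

With these inputs the two values come out at roughly -28.5 and -34.8 kJ/mol, so the order-of-magnitude gain in affinity corresponds to only about 6 kJ/mol of extra binding energy, small enough to be consistent with the low dissociation barrier the text argues N must retain.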

By itself, the SARS-CoV N protein is a non-specific nucleic acid-binding protein. It has been shown to bind single-stranded RNA (ssRNA), single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) in vitro (Takeda et al., 2008; Tang et al., 2005). Non-specific binding is also expected, since encapsidation of the entire viral genome requires the N protein to bind diverse sequences with reasonable affinity. Although one could argue that the N protein may bind a particular sequence with high affinity and package the rest of the RNA through protein-protein interactions alone, such a scenario is unlikely because the interaction between N protein dimers is extremely weak in the absence of nucleic acids (Chang et al., 2006, 2013). Moreover, the highly charged regions are exposed to the solvent, so electrostatic forces are likely the main driving force behind protein-nucleic acid binding. Indeed, nucleic acid-binding sites on the NTD and CTD identified in NMR studies were found to have strongly positive surface charges (Huang et al., 2004b; Takeda et al., 2008). Takeda et al. also found that mutating Lys257 and Lys258 to Gln in the CTD decreased the binding affinity for ssDNA, whereas mutating the same residues to Arg, which preserves the positive charge, had no effect on binding strength (Takeda et al., 2008). These lines of evidence strongly indicate that SARS-CoV N protein binds nucleic acids non-specifically through electrostatic interactions.

6.2. Intrinsic disorder and coupled-allosteric binding of N to nucleic acids

The discovery that multiple regions within the SARS-CoV N protein can interact with nucleic acids provides critical insight into the binding mechanism. Although the binding strength of individual sites towards nucleic acids is only in the micromolar range, the concerted action of these sites confers higher nucleic acid-binding affinity on the N protein as a whole. The IDRs play a special role, since their inclusion not only increases the binding affinity but also enhances binding allostery, enabling the N protein to bind RNA with high cooperativity (Chang et al., 2009). A variety of functions have been found to be associated with domains containing conserved disorder, with DNA/RNA binding among the most common (Dunker et al., 2002; Dyson, 2012). Intrinsic disorder confers several advantages on a protein in performing its biological functions, including promiscuous basal activity, enhanced specificity, a larger capture radius for complex formation, and easier regulation by post-translational modification (Dyson, 2011). Two possible causes for the binding enhancement can be argued. First, the extended conformation of the N protein due to the presence of the IDRs increases the collision radius with nucleic acids. If binding were further coupled to changes in protein conformation, the rate of binding would be enhanced through the "fly-casting mechanism" proposed by Shoemaker et al. (2000). Second, the flexibility of the IDRs allows optimal alignment of the multiple nucleic acid-binding sites to interact with the same nucleic acid molecule in an allosteric fashion, resulting in a "coupled allostery" effect that enhances the binding affinity of the protein for the nucleic acid (Hilser and Thompson, 2007). In addition to the IDRs, the structural domains, NTD and CTD, could also act in conjunction to enhance the binding affinity. The NTD contains a number of aromatic residues conserved among coronavirus N proteins that may interact with nucleotide bases through stacking interactions, whereas the strongly electropositive surface formed by the CTD dimer is well suited to interacting with the phosphate backbone (Chen et al., 2007). Consistent with this model, mutagenesis studies by Grossoehme and coworkers found that Tyr127 on the NTD of MHV N protein was important for binding to a transcriptional regulatory sequence RNA (Grossoehme et al., 2009).
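Binding cooperativity of the kind described above is often summarized phenomenologically with the Hill equation, θ = [L]^n / (K^n + [L]^n), where θ is the fraction bound and a Hill coefficient n > 1 indicates positive cooperativity. The sketch compares a non-cooperative curve (n = 1) with a cooperative one; the midpoint K and the value n = 3 are illustrative choices, not parameters fitted by Chang et al. (2009).

```python
def hill_fraction_bound(ligand: float, k_half: float, n_hill: float) -> float:
    """Fractional saturation from the Hill equation: L^n / (K^n + L^n)."""
    if ligand < 0 or k_half <= 0 or n_hill <= 0:
        raise ValueError("invalid Hill parameters")
    l_pow = ligand ** n_hill
    return l_pow / (k_half ** n_hill + l_pow)

K_HALF = 1.0  # illustrative half-saturation concentration (arbitrary units)
for conc in (0.25, 0.5, 1.0, 2.0, 4.0):
    non_coop = hill_fraction_bound(conc, K_HALF, 1.0)
    coop = hill_fraction_bound(conc, K_HALF, 3.0)  # hypothetical n = 3
    print(f"[L]={conc:4.2f}  n=1: {non_coop:.2f}  n=3: {coop:.2f}")
```

Both curves pass through θ = 0.5 at [L] = K, but the cooperative curve switches from mostly free to mostly bound over a much narrower concentration window, which is the practical signature of the coupled-allostery behavior attributed to the IDRs.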

7. Packaging of the SARS-CoV ribonucleocapsid 

7.3. A putative model 

Accommodating the exceptionally large (~29 kb) SARS-CoV genome in newly formed virions less than 100 nm in size necessitates extremely tight, largely helical supercoiling of the nucleic acid within the RNP core. The inability to observe a well-structured RNP layer inside the SARS-CoV particle, together with the observation of only short coiled RNP fragments in MHV in cryo-EM reconstructions, strongly suggests that the helical nucleocapsid is a very flexible structure that extensively twists and folds upon itself (Barcena et al., 2009). Such a dynamic coronavirus RNP structure is not entirely unexpected, since RNA is known to be dynamic and to exist in multiple folded and unfolded states (Dyson, 2012). Thus, RNA-protein recognition often involves an induced-fit process, in contrast to protein-B-DNA interactions, which most often manifest as molding of the protein onto the B-form DNA structure. Furthermore, the modular organization of the N protein, with its three long IDRs, provides considerable flexibility. No existing data support the presence of a long-lived SARS-CoV N oligomer or intermediate in solution, and the SARS-CoV genomic ssRNA by itself is unlikely to form a helix of the length observed by cryo-EM. Thus, packaging of the SARS-CoV RNP most likely proceeds through an RNA binding-coupled packaging mechanism, as also proposed for MHV, for which the RNA synthesized was shown to be mostly of genome length and encapsidated by N protein (Compton et al., 1987). This suggests that coronavirus RNA synthesis is coupled to the encapsidation of nascent RNA, analogous to the replication of viruses with helical negative-strand RNA nucleocapsids. Based on the available detailed 3D structural information on the SARS-CoV N protein modules and our understanding of N-RNA interaction, we propose a probable model derived from the crystal structure of the CTD (Chen et al., 2007), whose oligomer was shown to exist transiently in solution by disulfide-trapping experiments (Chang et al., 2013) (Fig. 4A and B). A putative scenario of the molecular events leading to RNP formation is as follows:

(1) Initiation: In solution, initial binding of RNA to either the NTD or the CTD facilitates binding of the other modules to RNA in a coupled-allostery manner, with the RNA molecule threading between the two structural domains. This initial N-RNA binary complex (RNP0) is highly stable, and each RNA molecule may have several N proteins bound at any given time.

(2) Growth: RNP0 could grow either by recruiting more N to adjacent RNA sites or by sliding or hopping along the linear RNA molecule and combining with other, smaller N-RNA oligomers to form larger oligomers (RNPn) of various sizes. The N proteins in RNPn would pack in a structure with the CTD forming the helical core, similar to that observed in the CTD crystal structure, and the RNA would wrap and twist around the helical groove, mostly through electrostatic interactions between the positively charged residues in the groove and the phosphate backbone of the RNA molecule (Fig. 4C and D).

(3) Packaging of NTD: The NTD module will cap the outside of the helical CTD-RNA complex, with the charged surface at the junction between the β-sheet and the β-hairpin covering the free phosphate groups of the RNA molecule. Furthermore, RNA bases sticking out of the groove could intercalate between the aromatic rings on the NTD core at the bottom of the β-sheet (Figs. 3C and 4C). The long disordered LKR gives the two structural domains considerable freedom to adopt a wide range of orientations and positions for optimal packing of the RNP complex. Likewise, the RNA molecule has considerable freedom to adjust its local conformation by an induced-fit process. Thus, the N protein binds RNA in a fashion resembling an octopus clutching its prey (RNA) with all its tentacles (modules) (Fig. 4E and F).

(4) Thermodynamic basis: Electrostatic interaction drives the formation of the N-RNA complex, while the multitude of weak protein-protein interactions contributes to self-assembly of the helical RNP. This is consistent with the concept in virus assembly that capsid proteins associate through locally weak interactions to form globally stable structures (Zlotnick, 2003).

The RNP structure proposed above would have an outer diameter of ~16 nm and an inner diameter of ~4 nm, consistent with that observed by cryo-EM. Each N dimer would bind 7 RNA bases. The two termini would stick out of the helix, and the LKR would be accessible for interaction with the matrix protein M. The combination of a modular structure incorporating IDRs, multiple sites of moderate RNA-binding affinity, and weak dimer-dimer interactions in the N protein not only allows the packaging of a stable RNP but also offers energetically favorable conditions for expression of the viral genomic information. One can envision an unzipping mechanism in which the viral RNA is unwound and dissociated from the N protein in a stepwise manner, one module at a time, without the need to overcome the high energy barrier of dissociating a whole N protein at once. The weak interactions between N protein dimers also minimize the formation of kinetic traps and allow a greater degree of regulation of RNP assembly.
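Using the figures given in this section (a ~29 kb genome and 7 RNA bases per N dimer in the proposed model), a back-of-envelope estimate shows how many N proteins the model would need to coat one genome. The sketch assumes continuous, gap-free coverage of the RNA and ignores any free or differently bound N; the helper name is illustrative.

```python
import math

def n_dimers_to_coat(genome_nt: int, bases_per_dimer: int) -> int:
    """Minimum number of N dimers needed to cover a genome, rounding up."""
    if genome_nt <= 0 or bases_per_dimer <= 0:
        raise ValueError("lengths must be positive")
    return math.ceil(genome_nt / bases_per_dimer)

GENOME_NT = 29_000   # ~29 kb SARS-CoV genome (approximate, from the text)
BASES_PER_DIMER = 7  # per-dimer footprint in the proposed RNP model

dimers = n_dimers_to_coat(GENOME_NT, BASES_PER_DIMER)
print(f"N dimers: {dimers}, N monomers: {2 * dimers}")
```

Under these assumptions the model implies roughly 4,143 N dimers (about 8,300 N monomers) per genome. Because the genome length is only approximate, the estimate should be read as an order of magnitude rather than an exact count.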

7.2. Comparison with other viral RNP structures 

At present, the structures of several helical RNPs of RNA viruses have been reported. These include rabies virus (RV) (Albertini et al., 2006), vesicular stomatitis virus (VSV) (Green et al., 2006), respiratory syncytial virus (RSV) (Tawar et al., 2009), Lassa virus (Hastie et al., 2012; Hastie et al., 2011), Rift Valley fever virus (RVFV) (Raymond et al., 2012), Bunyamwera virus (BUNV) (Ariza et al., 2013; Li et al., 2013) and Leanyer orthobunyavirus (LEAV) (Niu et al., 2013). The N proteins of these RNA viruses all share a modular organization similar to that of the SARS-CoV N protein: an N-terminal arm, two domains connected by a flexible hinge, and a flexible C-terminal tail. With the exception of Lassa virus, the RNA binds to a positively charged crevice between the N- and C-terminal domains, which shields the RNA from the environment. Thus, RNA sequestration by nucleoproteins is likely a common mechanism used by RNA viruses to protect their genomes from host defense mechanisms. It also suggests that a conformational change in the RNA packing is required during expression and translation. The number of RNA bases bound per N protein varies from 4 in RVFV to 11 in LEAV.

8. Future perspectives 

Over the past 10 years, considerable insight into the structure and function of the SARS-CoV N protein has been gained. It is remarkable that members of the coronavirus N protein family share a common modular organization incorporating functionally important IDRs even though they share only moderate sequence identity. New biophysical information, together with recent studies employing classical genetic and biochemical methods, has started to provide a clearer picture of how the N protein forms the RNP and what factors affect the process. However, the quest to understand how the SARS-CoV N protein (and coronavirus N proteins in general) carries out its roles during the viral life cycle is still far from over. A critical missing piece of information is the atomic structure of the RNP complex, whose elucidation has been hampered by the low solubility of the complex and the labile nature of the full-length N protein. Obtaining the structure of the SARS-CoV RNP alone will probably not be enough; structures of a number of coronavirus RNPs will be needed to ascertain whether they share a common structural code.

Another open topic is the role of the N protein in the viral replication-transcription complex (RTC), which is composed of various coronavirus nonstructural proteins (Nsps). In MHV, the N protein has been shown to associate dynamically with the RTC (Verheije et al., 2010). Keane and Giedroc recently found that MHV N binds Nsp3 with high affinity through the NTD and LKR (Keane and Giedroc, 2013). Co-localization of the N protein with the RTC has also been observed in cells infected with SARS-CoV (Stertz et al., 2007), although whether the two interact physically remains to be established. One problem in this field is the lack of knowledge about the functions of the individual Nsps, which makes it extremely difficult to interpret the biological relevance of N-Nsp interactions. The N protein might also associate with the RNA-dependent RNA polymerase (RdRp) in coronaviruses (van der Meer et al., 1999), but the interaction is poorly defined, and more effort will be required to verify the association and clarify its role.

The SARS-CoV N protein has been reported to interact with numerous host cell proteins, such as the B23 phosphoprotein (Zeng et al., 2008), Smad3 (Zhao et al., 2008), the chemokine Cxcl16 (Zhang et al., 2010), translation elongation factor-1 alpha (Zhou et al., 2008), pyruvate kinase (Wei et al., 2012), and 14-3-3 (Surjit et al., 2005). Unfortunately, few follow-up studies have independently verified these interactions, and the large variation in the experimental conditions used to identify them makes it extremely difficult to obtain a coherent picture of the SARS-CoV N protein interactome in the host cell. On the other hand, a recent IBV study employing high-throughput mass spectrometry yielded a list of cellular proteins that may bind to the N protein (Emmott et al., 2013), and the same strategy could be applied to SARS-CoV and other coronaviruses (especially MERS-CoV) for interactome mapping. Comparisons between different coronavirus N protein interactomes should provide valuable information on host specificity and on the evolution of interactions between N and host cell proteins, and may offer insight into the development of antiviral agents against coronaviruses that target interactions between host cell proteins and the N protein.

The SARS-CoV N protein has been widely used as a diagnostic target for SARS (Surjit and Lai, 2008). The N gene shows comparatively little sequence variation, indicating that N is a genetically stable protein, which is a primary requirement for an efficient drug target candidate. Given its pathogenic effects inside the cell, it is not surprising that the N protein has also become a target in antiviral therapy. Disruption of RNP formation, through inhibition of either protein oligomerization or the nucleic acid-binding activity of nucleoproteins, has been effective in inhibiting other viruses in a laboratory setting. For example, nucleozin and its analogues were shown to inhibit influenza virus by targeting its nucleocapsid protein (Hung et al., 2012; Kao et al., 2010), and compounds targeting the interaction between the N protein and nucleic acids have been developed against HIV-1 (Musah, 2004). Recently, Lo et al. discovered an antiviral peptide that interfered with CTD oligomerization of the HCoV-229E N protein and inhibited HCoV production (Lo et al., 2013). Extending these studies to SARS-CoV and other novel human coronaviruses, e.g. MERS-CoV, could pave the way towards the discovery of new therapeutics that target the N protein.